{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e352c08c",
   "metadata": {},
   "source": [
    "# NVIDIA API Catalog, LlamaIndex, and LangChain\n",
    "\n",
    "This notebook demonstrates how to plug in a NVIDIA API Catalog [**ai-mixtral-8x7b-instruct as LLM**](https://build.nvidia.com/mistralai/mixtral-8x7b-instruct) and [**embedding ai-embed-qa-4**](https://python.langchain.com/docs/integrations/text_embedding/nvidia_ai_endpoints#setup), bind these into [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/) with these customizations.\n",
    "\n",
    "\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "    \n",
    "⚠️ There are continous development and retrieval techniques supported in LlamaIndex and this notebook just shows to quikcly replace components such as llm and embedding to a user-choice, read more [documentation on llama-index](https://docs.llamaindex.ai/en/stable/) for the latest information. \n",
    "</div>\n",
    "\n",
    "### Prerequisite \n",
    "In order to successfully run this notebook, you will need the following -\n",
    "\n",
    "1. Already successfully gone through the [setup](https://python.langchain.com/docs/integrations/text_embedding/nvidia_ai_endpoints#setup) and generated an API key.\n",
    "\n",
    "2. Please verify you have successfully pip install all python packages in [requirements.txt](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/notebooks/requirements.txt)\n",
    "\n",
    "In this notebook, we will cover the following custom plug-in components -\n",
    "\n",
    "    - LLM using ai-mixtral-8x7b-instruct\n",
    "    \n",
    "    - A embedding ai-embed-qa-4\n",
    "    \n",
    "Note: As one can see, since we are using NVIDIA API Catalog as an API, there is no further requirement in the prerequisites about GPUs as compute hardware\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a7632f9",
   "metadata": {},
   "source": [
    "---\n",
    "### Step 1 - Load [ai-mixtral-8x7b-instruct as LLM](https://build.nvidia.com/mistralai/mixtral-8x7b-instruct)\n",
    "\n",
    "Note: check the prerequisite if you have not yet obtain a valid API key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "01093277",
   "metadata": {},
   "outputs": [],
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "\n",
    "# del os.environ['NVIDIA_API_KEY']  ## delete key and reset\n",
    "if os.environ.get(\"NVIDIA_API_KEY\", \"\").startswith(\"nvapi-\"):\n",
    "    print(\"Valid NVIDIA_API_KEY already in environment. Delete to reset\")\n",
    "else:\n",
    "    nvapi_key = getpass.getpass(\"NVAPI Key (starts with nvapi-): \")\n",
    "    assert nvapi_key.startswith(\"nvapi-\"), f\"{nvapi_key[:5]}... is not a valid key\"\n",
    "    os.environ[\"NVIDIA_API_KEY\"] = nvapi_key"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc184f10",
   "metadata": {},
   "source": [
    "run a test and see the model generating output response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "618360bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "# test run and see that you can genreate a respond successfully\n",
    "from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
    "llm = ChatNVIDIA(model=\"ai-mixtral-8x7b-instruct\", nvidia_api_key=nvapi_key, max_tokens=1024)\n",
    "result = llm.invoke(\"Write a ballad about LangChain.\")\n",
    "print(result.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86cbcf3b",
   "metadata": {},
   "source": [
    "### Step 2 - Load the chosen an embedding into llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "607bdfd7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create and dl embeddings instance wrapping huggingface embedding into langchain embedding\n",
    "# Bring in embeddings wrapper\n",
    "from llama_index.embeddings import LangchainEmbedding\n",
    "\n",
    "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n",
    "nv_embedding = NVIDIAEmbeddings(model=\"ai-embed-qa-4\")\n",
    "li_embedding=LangchainEmbedding(nv_embedding)\n",
    "# Alternatively, if you want to specify whether it will use the query or passage type\n",
    "# embedder = NVIDIAEmbeddings(model=\"nvolveqa_40k\", model_type=\"passage\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc340ec1",
   "metadata": {},
   "source": [
    "Note: if you encounter typing_extension error, simply reinstall via :pip install typing_extensions==4.7.1 --force-reinstall"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1484e5d5",
   "metadata": {},
   "source": [
    "### Step 3 - Wrap the NVIDIA embedding and the NVIDIA ai-mixtral-8x7b-instruct model as the main llm into llama-index's ServiceContext"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a5da1d7a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Bring in stuff to change service context\n",
    "from llama_index import set_global_service_context\n",
    "from llama_index import ServiceContext\n",
    "\n",
    "# Create new service context instance\n",
    "service_context = ServiceContext.from_defaults(\n",
    "    chunk_size=1024,\n",
    "    llm=llm,\n",
    "    embed_model=li_embedding\n",
    ")\n",
    "# And set the service context\n",
    "set_global_service_context(service_context)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0990b50",
   "metadata": {},
   "source": [
    "### Step 4a - Load the text data using llama-index's SimpleDirectoryReader and we will be using the built-in [VectorStoreIndex](https://docs.llamaindex.ai/en/latest/community/integrations/vector_stores.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47a17ce2",
   "metadata": {},
   "outputs": [],
   "source": [
    "#create query engine with cross encoder reranker\n",
    "from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext\n",
    "import torch\n",
    "\n",
    "documents = SimpleDirectoryReader(\"./toy_data\").load_data()\n",
    "index = VectorStoreIndex.from_documents(documents, service_context=service_context)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f3d2ebf",
   "metadata": {},
   "source": [
    "### Step 4b - This will serve as the query engine for us to ask questions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1d3827d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Setup index query engine using LLM\n",
    "query_engine = index.as_query_engine()\n",
    "\n",
    "# Test out a query in natural\n",
    "response = query_engine.retrieve(\"Who is the director of the movie Titanic?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "569d92f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "for item in response:\n",
    "    print(f\"retrieved text {item.get_text()} , with score :{ item.get_score()}\")\n",
    "    print(\"---\"*10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f86c9429",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
